[config reload] Fix config reload failure due to sonic.target job cancellation #1814
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@rajendra-dendukuri, please review
@@ -686,7 +686,7 @@ def _stop_services():
         pass

     click.echo("Stopping SONiC target ...")
-    clicommon.run_command("sudo systemctl stop sonic.target")
+    clicommon.run_command("sudo systemctl stop sonic.target --job-mode replace-irreversibly")
Can we also add this option to the systemctl restart sonic.target case at line 709?
Added!
@vivekreddynv could you please confirm that this PR can be cherry-picked to 202106 and 202012 cleanly? If not, please create separate PRs.
Verified manually. The changes can be cleanly cherry-picked to 202012 and 202106.
…cellation (#1814) #### What I did Fixes sonic-net/sonic-buildimage#7508
Where can I get the updated bin install file that comes with this fix? I would like to try it out on the Mellanox 2700 I was working with and having issues configuring due to this bug.
The submodule update for this repo has been raised here: sonic-net/sonic-buildimage#8741. Once this gets merged, you can get the image from the artifacts available here: https://dev.azure.com/mssonic/build/_build or, once the CI is completed, you can use those artifacts here:
Wonderful, thanks for the info. May I ask what the ETA is on the merge? No hurry, I am just curious. Thanks again for your response.
@v-wfarris there might be some issues with that fix, so we will probably need yet another PR.
@@ -706,7 +706,7 @@ def _reset_failed_services():

 def _restart_services():
     click.echo("Restarting SONiC target ...")
-    clicommon.run_command("sudo systemctl restart sonic.target")
+    clicommon.run_command("sudo systemctl restart sonic.target --job-mode replace-irreversibly")
@vivekreddynv please double check this change
@stepanblyschak please elaborate on the consequences
@vivekreddynv It looks like if restart is executed with --job-mode replace-irreversibly, we will still have the same issue: the start job will be placed in systemd's job queue as "replace-irreversibly", and the stop job of the next config reload will be discarded by systemd because the queued start job is irreversible, leading to an error that looks something like this:
Transaction for sonic.target/stop is destructive
Could you please double check? I think we only need to stop services with the guarantee that the stop will be executed successfully.
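The suggestion above (make only the stop irreversible, leave the restart on the default job mode) can be sketched as follows. This is a minimal illustration, not the code in config/main.py: build_stop_command, build_restart_command, and run are hypothetical helper names, and the real code calls clicommon.run_command with a single command string.

```python
import subprocess

# Job mode discussed in this thread; see the --job-mode= option in systemctl(1).
STOP_JOB_MODE = "replace-irreversibly"


def build_stop_command(target="sonic.target"):
    """Stop command (hypothetical helper): marked irreversible so that
    conflicting jobs queued later cannot replace it."""
    return ["sudo", "systemctl", "stop", target, "--job-mode", STOP_JOB_MODE]


def build_restart_command(target="sonic.target"):
    """Restart command (hypothetical helper): default job mode, so the queued
    start jobs stay replaceable by the stop of a later config reload."""
    return ["sudo", "systemctl", "restart", target]


def run(command, runner=subprocess.run):
    """Echo and execute a command; raises if it exits non-zero.

    The runner argument is an assumption of this sketch, added so the
    function can be exercised without calling systemctl.
    """
    print("Running: " + " ".join(command))
    return runner(command, check=True)
```

On a switch, run(build_stop_command()) followed by run(build_restart_command()) would mirror the stop and restart steps of a config reload under these assumptions.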
When the restart job (and all the other dependent jobs) has finished and been taken out of systemd's job queue, I don't think the next stop job will be cancelled.
However, simultaneous config reloads in quick succession can lead to the behavior you described:
admin@sonic:~$ sudo systemctl restart sonic.target --job-mode replace-irreversibly
admin@sonic:~$ sudo systemctl stop sonic.target --job-mode replace-irreversibly (Ran Immediately)
Failed to stop sonic.target: Transaction for sonic.target/stop is destructive (ntp-config.service has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status sonic.target' for details.
(After all the dependent jobs of restart sonic.target are done, it works)
admin@sonic:~$ sudo systemctl stop sonic.target --job-mode replace-irreversibly
admin@sonic:
The problem here is that sonic.target start, unlike sonic.target stop, is not a blocking call.
Nevertheless, the idea of this PR is that config reload should not fail, so it makes sense to remove this option from the restart. I'll raise a separate PR.
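For anyone reproducing the transcript above, one way to see whether the start jobs from a previous restart are still queued is to inspect the output of systemctl list-jobs. The sketch below is not part of this PR. It only parses text, and it assumes the usual JOB UNIT TYPE STATE column layout; both function names are hypothetical.

```python
def units_with_queued_jobs(list_jobs_output):
    """Return {unit: job_type} parsed from `systemctl list-jobs` text.

    Assumes one job per line in JOB UNIT TYPE STATE order; header and
    summary lines are skipped because they do not start with a job id.
    """
    jobs = {}
    for line in list_jobs_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():
            jobs[fields[1]] = fields[2]
    return jobs


def has_queued_start_jobs(list_jobs_output):
    """True while at least one 'start' job is still in the queue."""
    return "start" in units_with_queued_jobs(list_jobs_output).values()
```

Feeding it the text of sudo systemctl list-jobs taken right after the restart should report ntp-config.service with a start job, matching the unit named in the error message; once the queue has drained, has_queued_start_jobs returns False.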
#### What I did
As discussed in this PR #1814 (comment), only the stop job should have job-mode set to replace-irreversibly. Otherwise, simultaneous config reloads in quick succession can cause the next stop job to be rejected as a destructive transaction. Once the restart job (and all the other dependent jobs) has finished and been taken out of systemd's job queue, however, the next stop job will not be cancelled.
* d03ba4f [202012] [portstat, intfstat] added rates and utilization (sonic-net#1812)
* 499ad3f [config reload] Fix config reload failure due to sonic.target job cancellation (sonic-net#1814)
* 96d658c [202012][sonic installer] Add swap setup support (sonic-net#1815)
* a9c6970 platform pre-check for reboot in 202012 branch (sonic-net#1788)
* 0e0478b Unify the number format in the ourput of portstat and pfcstat in all cases (sonic-net#1795)
* 2d1e00e [ecnconfig] Fix exception seen during display and add unit tests (sonic-net#1784) (sonic-net#1789)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
#### What I did
As discussed in this PR sonic-net/sonic-utilities#1814 (comment), only the stop job should have job-mode set to replace-irreversibly. Otherwise, simultaneous config reloads in quick succession can cause the next stop job to be rejected as a destructive transaction. Once the restart job (and all the other dependent jobs) has finished and been taken out of systemd's job queue, however, the next stop job will not be cancelled.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
What I did
Fixes sonic-net/sonic-buildimage#7508
How I did it
How to verify it
With this change:
Without this Change:
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)